DOCUMENT RESUME

ED 217 711                                FL 012 97?

AUTHOR          Stansfield, Charles.
TITLE           Reliability and Validity of the Secondary Level
                English Proficiency Test.
PUB DATE        82
NOTE            29p.
EDRS PRICE      MF01/PC02 Plus Postage.
DESCRIPTORS     *English (Second Language); *Language Proficiency;
                *Language Tests; Listening Comprehension; Multiple
                Choice Tests; Reading Tests; Secondary Education;
                Testing; *Test Reliability; *Test Validity
IDENTIFIERS     *Secondary Level English Proficiency Test

ABSTRACT

     The Secondary Level English Proficiency (SLEP) test is a
group-administered, 150-item multiple-choice test of English language
proficiency that includes two subscores and eight different item
types. It is designed to assess a foreign student's readiness for
English-medium instruction at the secondary level. This paper reports
on two studies which were conducted during field testing of the
instrument. The results indicate high reliability for the total test
(.96) and for each subtest (.94 and .93). A validity study involved
analysis of test scores and demographic data for U.S. public school
students. The data grouped students according to citizenship status,
length of time in school, length of time in the United States, length
of English study within and outside the United States, and grade. The
results indicate consistent growth in the expected direction for the
subgroups established for each variable. (Author)



RELIABILITY AND VALIDITY OF THE
SECONDARY LEVEL ENGLISH PROFICIENCY TEST






Charles Stansfield 
Educational Testing Service 
Princeton, New Jersey 







INTRODUCTION

This paper is a report to the profession on a new test of English as a
second language, the Secondary Level English Proficiency (SLEP) test.
The report is intended to meet two objectives: to provide information
that might be of use to item writers and others interested in various
techniques for assessing language skills; and to provide additional
information, not available elsewhere, which might be of use in judging the
overall validity of the test. Regarding the second objective, we view the
establishment of validity as the marshalling, over a period of time, of
evidence which would either support or would not support the use of a
test for a particular function or for a particular type of examinee.
Thus, validity is basically a judgmental or inferential matter, rather
than a purely empirical one.



The SLEP is a 150-item, four-option multiple-choice test of English
language proficiency. The test provides a total score and diagnostic
subscores that measure ability in two primary areas: understanding
spoken English and understanding written English. Henceforth, we will
refer to the sections that measure these areas as the "listening" and
"reading" sections of the test. Each section contains 75 items. Including
the time required for listening to the directions and doing sample items,
Section One, listening, lasts 40 minutes, and Section Two, reading, lasts
45 minutes. The total time required to administer both sections of the
test is one hour and twenty-five minutes.



The SLEP was developed with support provided by the TOEFL Policy
Council, as a secondary school version of the Test of English as a Foreign
Language. Published by Educational Testing Service (ETS), the SLEP is designed
for use as a selection or admissions instrument by private secondary
schools, or as a placement instrument by public secondary schools. (1)



HISTORY 



The history of the SLEP dates back to the mid 1970s, when ETS
received frequent inquiries from private secondary schools in the United
States and abroad regarding the development of a lower level version of
the TOEFL. In response to the interest expressed, in 1976 TOEFL staff
sent a questionnaire to 500 private domestic and overseas secondary
schools. The questionnaire sought information on the schools' need for an
English language test for selection and placement purposes, the English
language screening procedures currently used in the admissions process,
and the degree of interest in a lower level TOEFL. Over 60% of the
schools returning the questionnaire indicated support for the development
of a secure English as a second language proficiency test for secondary
schools. Subsequent contacts with officers of the National Association of
Independent Schools (NAIS) and other knowledgeable persons indicated that
the foreign student population in private secondary schools had doubled
between 1974 and 1978, and, as in universities, the population was
continuing to grow. By 1978, over 25,000 foreign students were enrolled
in private secondary schools belonging to the NAIS organization.

In 1977, ETS staff accessed data on the date of birth of 1976 TOEFL
registrants. It was found that 1,984 students between ages 12 and 16
had taken the TOEFL. These students were probably too young to enter
college, and therefore it was assumed that their reason for taking the
TOEFL was related to a desire to be admitted to private secondary schools.

In October, 1978, ETS invited representatives of public and private
secondary schools enrolling a large number of international students to a
meeting at which the possibility of developing a lower level TOEFL was
discussed. Strong support was expressed for such a measure. Feedback
indicated that the TOEFL, which emphasizes college level academic English,
was too difficult for this group of students and not adequately focused
on the kind of language they encounter. The indication was that listening
and reading were the communicative skills that should be assessed. Although
some support was expressed for an actual measure of writing, it was
assumed that this skill was sufficiently related to reading proficiency
that a separate measure would not be necessary. The following month, the
TOEFL Policy Council approved a proposal to develop the SLEP.



TEST DEVELOPMENT



Preliminary SLEP item types were developed by ETS staff and submitted
to members of a Committee of Examiners composed of six secondary school
teachers of English as a second language. During 197?, meetings were held
to review potential item types and discuss test specifications. Subse-
quently, a fourteen-page set of test specifications was developed. Early
in the test development process, it was decided to use a multiple-choice
item format. This format helps ensure score reliability through the
standardization of administration procedures and also eliminates the
need to rely on the subjective judgment of raters. The choice of material
for the test was based on an analysis of actual textbooks designed for use
in American classrooms in grades 7-11. Regarding the social context of the
items, the committee decided to present situations representative of those
encountered by students in American secondary schools. This design
decision is particularly evident in the conversations used to test
listening comprehension.

Eventually, almost ... questions were written and reviewed by ETS
test development staff and by the secondary school ESL teachers. Following
review, sixteen different pretests were administered to students in order
to gather data on item performance. Over 6,600 students in 30 secondary
schools representing 12 countries in North America, Central America, South
America, Africa, Asia, Europe and the Middle East took a form of the
pretest. Subsequently, it was determined that 84 percent of the pretested
items could be used in operational forms of the test. Two such forms have
since been assembled. The second form is equated to the first through
common-item equating. Examinee score and item response data from the
first form are examined in depth below.

v 4 

TEST CONTENT 

In addition to the history already outlined, each of the eight item
types selected for inclusion in the specifications and on the test will be
described briefly in order to provide information that can be used to
judge the content validity of the test.

Section One, the listening section, contains four parts, each of
which has one item type. Part One requires the student to comprehend and
correctly identify a sentence describing a single picture stimulus. The
student hears four sentences and marks the letter of the sentence that
correctly describes the picture. The SLEP contains 25 such listening
comprehension items, dealing with correct recognition of minimal pair
contrasts, juncture, stress, sound clusters, tense, voice, prepositions,
and vocabulary.

Part Two consists of 12 items based on a map drawn in the test
booklet. The map represents the downtown area of a small town, including
buildings, parks, street names, etc., and depicts four cars labelled A, B,
C, and D. After listening to a brief conversation between two people,
the student must decide in which car the conversation occurred. Each
conversation discusses how the occupants of the car will get from where
they presently are to where they want to go (e.g., "I'd like to very
much. If we continue on Mackerel to the circle and go around to Salmon,
we can park on Cod Lane."). This part assesses an integrated variety of
linguistic, cultural, and pragmatic concepts. These include directions
(e.g., compass points: north, south, east, west; turns: right,
left, straight; street relationships), recognition of building names and
association of appropriate vocabulary with the building (e.g., "snack"
with a restaurant), distances, and time. Map items are essentially pure
items of listening comprehension, since very little reading is involved.

Part Three of the listening subtest consists of 28 items based on
extended conversations. These conversations, representing typical secondary
school situations, were recorded by American high school students. For
each recorded question, the student must choose one of the four answers
printed in the test book. The conversations take place in various parts
of the school (e.g., cafeteria, library, study hall, counselor's office,
nurse's office, etc.) and deal with events that typically occur in each
location (e.g., a pep rally in the gymnasium). The conversations may also deal
with extracurricular activities, such as car washes, bake sales, the yearbook,
or sports. Typical high school academic subjects, such as civics, geometry,
and current affairs, or non-academic matters such as school closings and
vacations, may also be the topics of these conversations.
As can be seen, this and other parts of the test are in no sense "culture-
free." Rather, a deliberate attempt is made to link the language tested
to an appropriate cultural milieu; thus, the SLEP is a language proficiency
measure based on the language that is likely to be encountered by a
student attending high school in the United States or an American-type
high school overseas.

Part Four consists of 20 items involving what we have called a
multiple-choice dictation. This item type obviates the complaint (Lado,
1961) that the dictation tests too many elements at once and overcomes the
problem of subjective grading of responses. The student must match one of
four sentences printed in the test book with a sentence heard on tape.
Many of the sentences are the type of utterance the student is likely to
hear from the teacher or classmates (e.g., "Shouldn't you do your reading
assignment before answering the questions?"). The distractors emphasize
structural variations rather than phonological problems. All distractors
are grammatically correct, and none is merely a rephrasing of the keyed
answer. We have called this a dictation because it functions as a
dictation psycholinguistically. The student must retain the complete
thought in short-term memory while he constructs how it should be written.
This process is similar to writing the sentence without confusing or
misinterpreting what was dictated.

Parts Five through Eight are found in Section Two, which is the
reading comprehension subtest. Part Five consists of twelve items based
on a single cartoon. The cartoon depicts a specific event, such as a
school closing due to a heavy snowstorm, and the reaction of several
members of a family to that event. Each family member is labelled with
the letter A, B, C, or D in the cartoon. The examinee reads a series of
stimulus utterances, each of which is composed of two or three sentences,
and indicates which member of the family probably made each utterance,
based on the situation depicted. This part tests the reading of short
passages that describe a common situation using everyday vocabulary.

The student sees four line drawings for each item in Part Six. After
reading a single sentence, the student must indicate which drawing the
sentence describes. The items in this section test prepositions, adjectives
(e.g., words indicating quantity and size, as well as the comparative and
superlative forms), adverbs, pronouns (case), and other words that indicate
the relationship between the people or objects portrayed.

Part Seven consists of 40 items based on three multiple-choice cloze
passages testing a wide variety of grammatical and lexical elements. In
fourteen of these forty items, the student answers a series of reading
comprehension questions after the passage, based on the information contained
in the cloze passage. Thus, the cloze is very efficiently used as both a
test in itself and the stimulus paragraph about which a series of other
questions is asked.

. • . . 

Part Eight consists of eight reading comprehension items based on a
110-word literary passage from a high school textbook. The examinee must
comprehend and recall details of the passage and make inferences as to
the main idea, tone, relationships between characters and events, the
author's purpose, etc.



Subsequent to the assembly of the first operational form, the
SLEP was administered to foreign students enrolled in American public
schools and to non-native English speaking students enrolled in private
schools overseas in order to gather data on its reliability and validity.
The information presented in the remainder of this paper is based on test
performance and demographic data produced by those two different populations.

TEST ANALYSIS

In the first study reported here, the SLEP was offered as an alternative
to the TOEFL to foreign secondary school students applying for admission
to American private secondary schools. It was given at secure international
administrations in January and May, 1980, to a total of 310 students in 25
countries. From these data, a standard scale was developed for equating
scores across forms, and a statistical analysis of the test's overall
performance was carried out. The standard scale is based on the T-score,
which has a mean of 50 and a standard deviation of 10. Because the raw
scores gathered in the first two international administrations exhibit
a ceiling effect, scaled scores range from a low of 20 to a high of 67.
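The raw-to-scale conversion described above can be sketched as a linear T-score transformation. This is a minimal illustration, not the actual ETS conversion: it assumes the reference group's raw mean and standard deviation are known, that scaled scores are rounded to whole numbers, and that values are held within the reported 20 to 67 range. The function name and the reference statistics in the usage lines are hypothetical, chosen only to show how a high raw mean (the ceiling effect) leaves a perfect raw score in the 60s.

```python
def to_t_score(raw: float, ref_mean: float, ref_sd: float,
               lowest: int = 20, highest: int = 67) -> int:
    """Convert a raw score to a T-score (mean 50, SD 10) in a reference group.

    The 20-67 bounds mirror the reported SLEP scaled-score range;
    clamping to them is an assumption of this sketch.
    """
    if ref_sd <= 0:
        raise ValueError("reference standard deviation must be positive")
    z = (raw - ref_mean) / ref_sd     # standard score in the reference group
    t = round(50 + 10 * z)            # T-score: mean 50, SD 10
    return max(lowest, min(highest, t))


# Hypothetical reference statistics for a 150-item test with a high raw mean.
REF_MEAN, REF_SD = 110.0, 24.0
print(to_t_score(110, REF_MEAN, REF_SD))  # 50
print(to_t_score(150, REF_MEAN, REF_SD))  # 67: a perfect score is only 1.67 SD above the mean
print(to_t_score(0, REF_MEAN, REF_SD))    # 20
```

With a high group mean, even a perfect raw score sits less than two standard deviations above it, which is why the top of the scale falls well short of what an unrestricted T-score could reach.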



Insert Table I about here.

Table I portrays the raw score distributions for each of the eight
parts of the test. Middle difficulty is the midpoint between the maximum
possible score and the score that would be expected if each item were
answered at random. For four-option tests like the SLEP, the chance score is
25 percent of the maximum, so middle difficulty corresponds to 62.5 percent
of the maximum possible score. The table shows that all but two of the
parts are easier than middle difficulty.
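The middle-difficulty definition reduces to simple arithmetic. The sketch below (the function name is ours, not from the source) computes the midpoint between a perfect score and the expected chance score for any number of items and options; for four options it always lands at 62.5 percent of the maximum.

```python
def middle_difficulty(n_items: int, n_options: int) -> float:
    """Midpoint between a perfect score and the expected score from random guessing."""
    chance_score = n_items / n_options    # expected number correct by guessing
    return (n_items + chance_score) / 2


# Part Seven has 40 four-option items: chance = 10 correct, midpoint = 25 correct.
mid = middle_difficulty(40, 4)
print(mid, mid / 40)  # 25.0 0.625
```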



Part Two, the multiple-choice dictation, was the easiest part of the
test. The average performance on this part for all students tested was
90% correct. Thus, it appears that such items are generally quite easy,
although their difficulty can be increased through the use of good
distractors. (2)

Almost equal in facility (89% correct) were the items based on a
cartoon. One could argue that performance on this part is related to
familiarity with the American family life which it portrays. The data
presented in Table I were obtained from students living outside the United
States. The fact that they performed so well on this part indicates that the
cultural referents in the cartoon do not interfere with the ability to
answer the questions. The items based on pictures (i.e., a single picture
per item, four pictures per item, and items based on a town map) were
next in level of difficulty, and were also relatively easy.

The three most difficult item types were those that did not contain
pictures (i.e., comprehension of extended conversation, cloze, and
comprehension of a literary passage). Performance on such integrative
items involves global understanding of the context as well as recognition
of discrete elements of language. Since the SLEP is one of the first
standardized foreign language tests to use multiple-choice cloze items, it
is of interest to note how examinees performed on these items.

As can be seen in Table I, these items were of middle difficulty (62%
correct), which is the ideal difficulty level for maximum discrimination.



RELIABILITY



The last row of Table I shows the KR-20 index of internal consistency
reliability. This index estimates parallel-form reliability from the
inter-item consistency within each part. Thus, inter-item consistency serves
as a surrogate measure of parallel-form reliability, thereby eliminating
the need to administer and correlate two different forms of the test. As
can be seen, the two experimental item types, multiple-choice dictation
and multiple-choice cloze, showed the highest reliabilities. However, the
cloze procedure is the more efficient. Ninety-three percent of pretested
cloze items were usable, whereas only 63% of the pretested dictation items
were usable. For the SLEP, a usable item was one whose biserial correlation
with total score for the section in which it appears is greater than .30.
Such items are considered minimally efficient discriminators of language
skills. A greater proportion of cloze items, as compared to the other
item types written by staff, met this criterion of discrimination
power. This suggests that in the future test authors would do well to
consider cloze items for inclusion on standardized second language tests.
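For readers who want to see how a KR-20 coefficient like those in Table I is obtained, the sketch below applies the standard Kuder-Richardson formula 20 to a matrix of dichotomously scored responses (1 = correct, 0 = incorrect), using the population variance of total scores. The tiny response matrix in the usage lines is invented for illustration; it is not SLEP data.

```python
from statistics import pvariance


def kr20(responses: list[list[int]]) -> float:
    """Kuder-Richardson formula 20 for 0/1 item scores.

    responses[i][j] is examinee i's score (0 or 1) on item j.
    KR-20 = k / (k - 1) * (1 - sum(p_j * q_j) / var(total scores)).
    """
    n_examinees = len(responses)
    k = len(responses[0])
    if k < 2:
        raise ValueError("KR-20 needs at least two items")
    totals = [sum(row) for row in responses]
    total_var = pvariance(totals)          # population variance of total scores
    if total_var == 0:
        raise ValueError("total scores have no variance")
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in responses) / n_examinees  # proportion correct on item j
        sum_pq += p * (1 - p)
    return (k / (k - 1)) * (1 - sum_pq / total_var)


# Invented example: 4 examinees, 3 items.
demo = [[1, 1, 1],
        [1, 1, 0],
        [1, 0, 0],
        [0, 0, 0]]
print(kr20(demo))  # 0.75
```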

Insert Table II about here.



Table II depicts some descriptive statistics by section for the same
sample of adolescent students. The data indicate that SLEP total score and
section scores are highly reliable. This is due in part to the previously
mentioned decision, made at the test design stage, to utilize multiple-
choice item formats only. It is also due to the test's length and item
discrimination power. The items included on the final form of the test
discriminate well. The mean biserial correlation with section score for
items in the listening section is .61. For items in the reading section
it is .55. In spite of its favorable item discrimination power, the test
remains relatively easy and is not significantly speeded. On the reading
comprehension section, which is not paced, 87% of the students finished
the test, and the mean number of items not reached was 1.0.
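Both the .30 usability threshold and the mean item-discrimination figures above rest on the biserial correlation between an item and its section score. The sketch below assumes 0/1 item scores and the textbook formula r_bis = (M1 - M0) / s * pq / y, where M1 and M0 are the mean section scores of examinees who answered correctly and incorrectly, s is the section-score standard deviation, and y is the standard normal ordinate at the point that splits the distribution into proportions p and q. The names `biserial` and `is_usable` and the toy data are illustrative; the operational ETS item analysis may have differed in detail (for example, in whether the item is excluded from the criterion score).

```python
from statistics import NormalDist, mean, pstdev


def biserial(item: list[int], criterion: list[float]) -> float:
    """Biserial correlation between a 0/1 item and a continuous criterion score."""
    p = mean(item)
    if p in (0, 1):
        raise ValueError("item must have both correct and incorrect responses")
    q = 1 - p
    s = pstdev(criterion)
    if s == 0:
        raise ValueError("criterion scores have no variance")
    m1 = mean(c for c, x in zip(criterion, item) if x == 1)  # mean score, correct group
    m0 = mean(c for c, x in zip(criterion, item) if x == 0)  # mean score, incorrect group
    y = NormalDist().pdf(NormalDist().inv_cdf(q))            # normal ordinate at the p/q split
    return (m1 - m0) / s * (p * q / y)


def is_usable(item: list[int], criterion: list[float], threshold: float = 0.30) -> bool:
    """Keep an item only if its biserial with the section score exceeds the threshold."""
    return biserial(item, criterion) > threshold


# Toy data: six examinees' scores on one item and on the section.
item = [1, 1, 0, 1, 0, 0]
section = [70, 64, 58, 52, 47, 40]
print(round(biserial(item, section), 2), is_usable(item, section))
```

Note that, unlike a point-biserial, the biserial coefficient can exceed 1.0 when the item splits examinees very cleanly, which is one reason a simple threshold such as .30 is used as a minimum rather than as a target.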

CONVERGENT AND DIVERGENT VALIDITY

Table III shows the observed correlations between part scores and
section scores for the eight parts of the test. Intercorrelations among
the four parts in Section I range from .55 to .72 and show a mean value
of about .63. Thus, it appears that the four parts are generally measuring
different but related aspects of language. The dictation has relatively
lower correlations with Parts Three and Four of the section than it does
with Part One. This could be related to the use of single sentences in
Parts One and Two, as opposed to the use of multiple-sentence conversations
in Parts Three and Four. Each part correlates well with the section
score. (4)



Insert Table III about here.



The intercorrelations among the four parts in Section II were more
disparate, ranging from .36 to .77. Part Five, containing items based on a
cartoon, shows the lowest intercorrelations with the other parts and with
the section score. In part this is due to the fact that this part is less
reliable than most of the others. Also, the fact that it was quite easy
(mean = 89% of the total possible score), limiting the variability among
scores, probably contributed to the lower correlations between it and other
parts of Section II. Again, the excellent performance of the cloze passage
is noteworthy. Its observed correlation with the section score was
very high (.90). Such an impressive outcome suggests that a multiple-
choice cloze shows considerable promise as an overall measure of reading
proficiency.

The correlations in Table III provide support for the conclusion that
the eight parts of the test measure different but moderately interrelated
aspects of the skill being assessed by the test. In toto they yield a
correlation of .78 between sections, which, given the high reliability of
each, means that each section is measuring aspects of language acquisition
that are closely related but not identical. Thus, each contributes some
unique variance to the total score, except for the cloze task, which for
this sample functioned as a near-perfect predictor of the reading compre-
hension score.
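The step from "a correlation of .78 between sections" to "closely related but not identical" can be made explicit with Spearman's correction for attenuation, which estimates the correlation the two sections would show if both were perfectly reliable. The paper does not report this calculation; the sketch below simply plugs in the .78 reported here and the subtest reliabilities of .94 and .93 given in the abstract.

```python
from math import sqrt


def disattenuate(r_xy: float, r_xx: float, r_yy: float) -> float:
    """Spearman's correction for attenuation: r_xy / sqrt(r_xx * r_yy)."""
    return r_xy / sqrt(r_xx * r_yy)


print(round(disattenuate(0.78, 0.94, 0.93), 2))  # 0.83
```

Even after removing the effect of measurement error, the estimated true-score correlation stays near .83 rather than approaching 1.0, which is consistent with the reading that the two sections share much, but not all, of what they measure.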

CRITERION-RELATED VALIDITY

Before the SLEP could be used in American public schools, it was
necessary to obtain data on the performance of various groups of nonnative
English speakers enrolled in different public school programs. The
data would make it possible to compare an individual student's performance
with the performance of other students with similar background character-
istics, and to use this information in determining appropriate placement
in a remedial or mainstream program. In order to accomplish this, a free
administration of the SLEP was provided to a large number of students,
and at the same time basic information on their background and current
educational placement was gathered. The following procedures were employed.

In August, 1980, one thousand secondary schools located in or near
large metropolitan areas were randomly selected to receive a letter
inviting each to participate in a validation study. Free test materials,
scoring services, and school score rosters were offered as an incentive
to participate. Sixty-eight schools representing 20 states responded
affirmatively to the invitation. The SLEP was administered by local
school personnel during the months of September and October according
to standardized procedures described in the SLEP Supervisor's Manual.
Individual shipments of test materials also contained a supply of
questionnaires to be filled out by students (see Appendix A). The
questionnaire requested student responses to six questions regarding
their visa status, enrollment in remedial or mainstream programs, and
exposure to English. Of the 1,744 students who took the test, 1,239
returned the questionnaire.



Insert Table IV about here.



Table IV depicts student responses to the question, "In which type of
program are you currently enrolled?" Student responses indicate that SLEP
scores rise consistently as the amount of remedial instruction decreases.
Students receiving part-time remedial instruction performed
better than those receiving full-time remedial instruction, and students
who were mainstreamed performed considerably better than those who were
receiving partial remedial assistance via special programs. It should be
remembered that the data presented here do not represent a single program
but rather a total of 68 programs. No attempt was made to ensure the
comparability of programs within each type. It is highly probable that
programs vary considerably from school to school. Indeed, this is indicated
by the standard deviation, which is larger than might be expected if the
programs were similar and homogeneous. Also, it should be remembered that
most schools do not simultaneously offer both bilingual education and
English as a second language instruction, and that no information was
collected as to the number and type of programs available in each school.
Therefore, it should not be concluded that students enrolled in ESL
programs are more proficient than students enrolled in programs of bilingual
education; such a hypothesis was not part of this project.

The comparative score data in Table IV are also presented in the
Manual for Administering SLEP, along with comparative score data on other
background variables included on the SLEP answer sheet and on the question-
naire. Similar tables include means and standard deviations by age,
grade, visa status, sex, and exposure to English. This information
will not be repeated here. Rather, we will use the data to gather
some evidence for the criterion-related validity of the SLEP. It was
understood at the start of this process that the relationship between
scores and criterion variables would be limited due to numerous sources of
unreliability, some of which have already been mentioned.

With these cautions in mind, the product-moment correlations between
SLEP scores and treatments were determined by recoding a full-time treatment
as a "1," a part-time treatment as a "2," and a regular class placement as
a "3." The observed correlations between score and program assignment
were moderate (.33 for ESL programs and .57 for bilingual education
programs). These correlations are attenuated by numerous factors, including
the unequal number of subjects sampled, the small number of placement
categories, the lack of consistency in programs across schools (part-time
could mean 20 minutes per week or three hours per day), the lack of
reliability and validity in the placement procedures actually employed
(placement by surname, etc.), and the fact that the questionnaires were
filled out by students with limited English proficiency. When viewed in
this context, the predictive validity of the SLEP tentatively appears to
be good. Controlled studies within individual schools, in which the SLEP is
administered concomitantly with other previously validated placement
procedures, will be necessary in order to establish a more accurate
estimate of predictive validity.
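The coding scheme described above (1 = full-time, 2 = part-time, 3 = regular class) turns program placement into a numeric variable, so an ordinary Pearson product-moment correlation can be computed against SLEP scores. The sketch below uses invented scores for six hypothetical students; the labels and helper names are ours, not from the study.

```python
from math import sqrt

# Recoding used in the study: more remedial support gets a lower code.
PLACEMENT_CODES = {"full-time": 1, "part-time": 2, "regular": 3}


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Invented data: SLEP total scores and reported placements.
scores = [30, 35, 45, 50, 60, 65]
placements = ["full-time", "full-time", "part-time", "part-time", "regular", "regular"]
codes = [PLACEMENT_CODES[p] for p in placements]
print(round(pearson(scores, codes), 2))  # 0.98
```

The invented data are deliberately clean; with real placements, the factors listed above (inconsistent programs, unreliable placement procedures, only three categories) would pull the coefficient far lower.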

Table V portrays the relationship between student test scores and
responses to other questions in the questionnaire. Because not all
persons responded to each question, the N for each criterion variable is
indicated. The data indicate that all variables are significantly related
to SLEP scores, although the strength of the relationship varies according
to the characteristic assessed.





Insert Table V about here.


Years of English study (1, 2, 3, 4 or more) showed the strongest
correlation with SLEP score. This information was indicated by students
on the SLEP answer sheet and includes instruction both within and
outside of the United States. The correlation with total score was .41.
The number of years of English study within the United States, as indicated
in three response categories, was more strongly correlated with SLEP score
than was the amount of English study outside of or prior to coming to the
United States. This suggests that for a cross-section of nonnative
students enrolled in United States public schools, the formal classroom
instruction received before arriving here plays only a small role in
explaining actual overall language proficiency. Time spent studying
English subsequent to arrival is a better predictor of current
proficiency as indicated by SLEP score (r = .34), as is time spent
in the United States (r = .35). The length of enrollment in the current
school is also related to SLEP score (r = .25), although not as strongly
as other variables that are associated with time spent in the United
States.



In general, the data indicate that the listening comprehension score
correlates slightly more strongly with each criterion variable than does
the reading score. This is not surprising, since listening comprehension
involves skill in understanding spoken English. This skill is more easily
acquired in an immersion environment, such as that represented by residence
in the United States, than in a foreign language environment.
Thus, listening comprehension is more sensitive to the variables assessed
here. In this respect, and given the high reliability (.94) of this
subscore, it may be permissible in some contexts to utilize a short form
of the SLEP consisting of the listening comprehension section only. On
the other hand, such a recommendation should be considered with caution,
since the criterion variables utilized here are not those associated with
success in school (e.g., grade point average).

It should be cautioned that all of the correlations presented here
are attenuated, since the number of possible responses was restricted to
between three and six. It is likely that SLEP scores would show higher
correlations with residence and schooling in the United States if a
greater number of response categories were available. Since only a few
categories were used here, these low to moderate correlations must stand
as modest indices of the criterion-related validity of the SLEP.
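The claim that collapsing a criterion into a handful of response categories lowers observed correlations can be checked with a small simulation. The sketch below draws pairs of correlated normal scores, computes their correlation, then collapses the criterion into three ordered categories and recomputes it. The population correlation (.60), cut points, sample size and seed are arbitrary assumptions for illustration, not estimates from the SLEP data.

```python
import random
from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson product-moment correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def simulate(rho: float = 0.60, n: int = 20000,
             cuts: tuple[float, ...] = (-0.5, 0.5), seed: int = 1) -> tuple[float, float]:
    """Return (r with continuous criterion, r with criterion collapsed into categories)."""
    rng = random.Random(seed)
    score, criterion = [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        score.append(z1)
        # Built to correlate rho with the score in the population.
        criterion.append(rho * z1 + sqrt(1 - rho ** 2) * z2)
    # Collapse the criterion into ordered categories 1, 2, 3 at the cut points.
    coarse = [1 + sum(value > cut for cut in cuts) for value in criterion]
    return pearson(score, criterion), pearson(score, coarse)


full_r, coarse_r = simulate()
print(f"continuous: {full_r:.2f}  three categories: {coarse_r:.2f}")
```

Under these assumptions the three-category version comes out noticeably lower than the continuous one, which is the direction of bias the paragraph above describes.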
This article has attempted to provide additional data which could be
used to evaluate the reliability and validity of the SLEP, and data on the
performance of the various item types used. It presents information not
reported in the test manual and further analyses of existing data. At
this time it is fair to conclude that SLEP scores, both part and total,
are highly reliable. The content validity of the test is good, particularly
for English as a second language students enrolled in grades seven through
twelve. The construct validity also appears to be good, since the analysis
indicates that the parts and sections are measuring different but inter-
related aspects of language proficiency. While favorable evidence of
criterion-related validity was presented, additional research is needed in
this area. This research should include studies at the district level of
the SLEP's ability to predict teacher placements, as well as placements
determined by local instruments or procedures whose validity has already
been established. In addition, SLEP scores should be examined for their
relationship to grades earned in mainstream classrooms and to scores on
local and national achievement tests. Finally, the SLEP scores of native
speakers at different grade levels should be determined in order to gain a
conceptual understanding of a "native speaker level" of performance, and
of the language skills differential that may still exist between nonnative
and native English speaking students at any given grade level.



NOTES 



(1) I wish to acknowledge the contributions to this report of my colleague
Francean Meredith, who supervised the test development process; Ann
Angell, who performed the statistical analysis of the data obtained
from the first two administrations of the test and obtained the
part-section correlations; NaW Turner, who performed the correlational
analysis of the public school data; and Paul Angelis, who
provided overall direction for the test project during its formative
years. Ann Angell also made helpful comments on earlier versions of
the manuscript, as did Gordon Hale, Gay MacQueen, and Russell Webster.

(2) DiFiore (1980) analyzed 56 SLEP pretest dictation items provided by
ETS. The items that functioned best had distractors that resembled
their keys in four areas: word position, syntax, semantics, and
phonology. Good distractors used the same word as the key at the
beginning and at the end of the sentence. They also used parallel
syntactic structures. The following pretest item exemplifies this
phenomenon:

     A. I wish Dr. Miller could tell me what to do.
    *B. I wish I could tell you where Dr. Miller is.
     C. I hope you can tell me who Dr. Holler is.
     D. I thought I knew where Dr. Miller is.

The correct response is B, and the best distractor is C. Each
sentence consists of a main clause and two dependent clauses.
However, distractor A ends with an infinitive in the third clause,
and distractor D lacks a modal and a direct object in the second
clause. Thus, their syntactic structure differs from that of the
keyed option. Semantic similarities refer to the use of similar yet
different elements within sentences, such as in adverbial phrases
indicating destination. Overall phonological similarity, the degree
to which the options sound alike, is the most pervasive element in
creating good distractors.


(3) During item analysis, a printing error was discovered in the test
booklet on one of the reading comprehension items based on a cloze
passage. Although the error was subsequently corrected, examinee
responses to this item were not counted in the test analysis data
reported here.

(4) These correlations have been corrected for spuriousness. Correlations
between scores having items in common, such as the part score with
the total score, are spuriously high. A statistical correction has
been made for this effect.
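The report does not name the correction it applied. One standard choice, sketched here as an assumption, correlates the part with the remainder of the test (total minus part), which can be computed from the part-total correlation and the two standard deviations alone:

    r_p(t-p) = (r_pt s_t - s_p) / sqrt(s_t^2 + s_p^2 - 2 r_pt s_t s_p)

The example checks the formula against a direct correlation of the part with the remainder on invented scores for six examinees.

```python
from statistics import pstdev


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def part_whole_corrected(r_pt, s_p, s_t):
    """Correlation of a part score with the rest of the test (total minus part).

    r_pt: observed part-total correlation; s_p, s_t: SDs of part and total.
    """
    return (r_pt * s_t - s_p) / (s_t ** 2 + s_p ** 2 - 2 * r_pt * s_t * s_p) ** 0.5


# Hypothetical scores (not SLEP data).
part = [10, 12, 15, 9, 14, 11]
rest = [20, 25, 24, 18, 30, 22]
total = [p + q for p, q in zip(part, rest)]

r_pt = pearson(part, total)  # inflated: the part is contained in the total
corrected = part_whole_corrected(r_pt, pstdev(part), pstdev(total))
print(f"raw part-total r = {r_pt:.2f}, corrected = {corrected:.2f}")
print(f"direct r(part, rest) = {pearson(part, rest):.2f}")  # equals corrected
```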



Table I

Statistics on Difficulty and Reliability of SLEP Item Types
(Based on Foreign Students Entering Grades 7 through 11)

Item Type               Section  Number    Mean Percent  Reliability*  Percent Usable
                                 of Items  Correct                     Pretested Items
Listening Dictation     LC       20        90            .889          65
Cartoon                 RC       12        89            .742          79
Single Picture          LC       25        85            .811          75
4-Picture Options       RC       15        78            .707          75
Map                     LC       12        74            .721          46
Extended Conversation   LC       18        68            .840          88
Cloze                   RC       40        62            .891          93
Literary Passage        RC        8        48            .685          88

*Kuder-Richardson Formula 20.






Table II

SLEP Descriptive Statistics by Section
(N = 310)

                                              Listening
                                              Comprehension
Mean Percent Correct
Mean Scaled Score
Mean R-Biserial of Items With Section Score

*Kuder-Richardson Formula 20.
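Table II reports the mean r-biserial of items with the section score. A sketch of the usual biserial formula is r_bis = ((M_p - M_q) / s_t)(p q / y), where M_p and M_q are the mean section scores of examinees passing and failing the item, s_t is the section-score standard deviation, p and q are the proportions passing and failing, and y is the standard normal ordinate at the point dividing p and q. The report does not say whether each item was excluded from the section score before correlating, and the data below are invented.

```python
from statistics import NormalDist, mean, pstdev


def biserial(item, total):
    """Biserial correlation between a 0/1 item score and a continuous total.

    Assumes the 0/1 split reflects an underlying normally distributed trait.
    """
    p = mean(item)  # proportion passing the item
    q = 1 - p
    m_p = mean(t for i, t in zip(item, total) if i == 1)
    m_q = mean(t for i, t in zip(item, total) if i == 0)
    s_t = pstdev(total)
    z = NormalDist().inv_cdf(q)  # cut point with area p above it
    y = NormalDist().pdf(z)  # height of the normal curve at the cut
    return (m_p - m_q) / s_t * (p * q / y)


# Invented data: six examinees' item scores and section scores.
item = [1, 0, 1, 0, 1, 0]
section = [12, 10, 9, 11, 14, 6]
print(f"r-biserial = {biserial(item, section):.2f}")
```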






Table III

Interrelations of SLEP Parts and Sections

                      Part
                      1     2     3     5     6     7     Section
1. Picture                                                .84
2. Dictation          .67                                 .78
3. Map                .64   .55                           .75
4. Conversation       .72   .55   .66                     .82
5. Cartoon                                                .61
6. Four Pictures                  .56                     .80
7. Cloze                          .53   .77               .90
8. Literary Passage               .36   .56   .62         .67
.67 






Table IV

SLEP Mean Scores by Instructional Program

Program*                           N      LC      RC   Total    S.D.
Bilingual Education, Full-Time    71    15.1    16.5    31.6     8.6
Bilingual Education, Part-Time    85    17.9    18.9    36.9    11.2
ESL, Full-Time                   159    18.5            37.5     9.6
ESL, Part-Time                   694    21.5    21.4    42.9    11.0
Regular                          211    25.3    25.1    50.4    12.1

*No response = 19.



Table V

Correlations Between SLEP Scores and Five Demographic Variables*

                             N  Listening  Reading  Total
Years English Study       1138        .40      .38    .41
  (Within US)             1112        .37      .27    .34
  (Outside US)             992                 .13    .12
Time in this School       1215        .27      .21    .25
Time in US                1220        .37             .35

*All correlations are significant at the p < .0001 level.
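The significance statement can be checked with the usual t test for a correlation, t = r sqrt(N - 2) / sqrt(1 - r^2) on N - 2 degrees of freedom. The sketch below applies it to the Time in US row of Table V (N = 1220, total-score r = .35). Converting t to a p value would additionally need the t distribution (for example SciPy's `scipy.stats.t.sf`), which is left out here to keep the sketch dependency-free.

```python
from math import sqrt


def t_for_r(r: float, n: int) -> float:
    """t statistic (df = n - 2) for testing H0: population correlation = 0."""
    return r * sqrt(n - 2) / sqrt(1 - r * r)


# Time in US row of Table V: N = 1220, correlation with total score = .35
n, r = 1220, 0.35
print(f"t = {t_for_r(r, n):.1f} on {n - 2} df")
```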




REFERENCES 



American Psychological Association. 1974. Standards for Educational and
Psychological Tests. Washington, D.C.: American Psychological Association.

[...]. [...]. Secondary Level English Proficiency [...]. Unpublished
Statistical Report. Princeton, NJ: Educational Testing Service.

DiFiore. 1980. Specifications for a Listening-Dictation Item-Type.
Unpublished Master's Thesis, University of Florida.

Educational Testing Service. 1979. SLEP Supervisor's Manual. Princeton,
NJ: Educational Testing Service.

Educational Testing Service. 1980. Secondary Level English Proficiency
Test. Princeton, NJ: Educational Testing Service.

Educational Testing Service. 1981. Manual for Administering SLEP.
Princeton, NJ: Educational Testing Service.

Lado, Robert. 1961. Language Testing. New York: McGraw-Hill.



Appendix A 



SECONDARY LEVEL ENGLISH PROFICIENCY TEST 
Student Questionnaire 



[...] how exactly you did on your [...]



NAME: ____________________  ____________________  ____
      (Family Name)         (First Name)          (Middle Initial)

BIRTHDATE: ____ / ____ / ____
           (Month) (Day)  (Year)

1. What is your current student status?
   a. [_] Foreign Student
   b. [_] Immigrant
   c. [_] Refugee
   d. [_] U.S. Citizen (Born in United States)
   e. [_] Non-Documented

2. In which of the following programs are you currently enrolled?
   a. [_] ESOL (English to Speakers of Other Languages) Program, Full Time
   b. [_] ESOL (English to Speakers of Other Languages) Program, Part Time
   c. [_] Bilingual Program, Full Time
   d. [_] Bilingual Program, Part Time
   e. [_] Regular Class (with native English speakers)

3. How long have you been enrolled at this school?
   a. [_] Less than 6 months
   b. [_] 6 months to 1 year
   c. [_] More than 1 year

4. How long have you lived in the United States?
   a. [_] Less than 6 months
   b. [_] 6 months to 1 year
   c. [_] More than 1 year, but less than 2 years
   d. [_] More than 2 years, but less than 5 years
   e. [_] More than 5 years, but not all my life

5. How long have you studied English in the United States?
   a. [_] Less than 1 year
   b. [_] More than 1 year, but less than 2 years
   c. [_] More than 2 years

6. How long have you studied English outside the United States?
   a. [_] Less than 1 year
   b. [_] More than 1 year, but less than 2 years
   c. [_] More than 2 years



[...] your scores on the second [...] information you provide by
answering this question [...] about school or any other school. [...]
question will be used for research studies, and no individually
identifiable records will be [...]



